# Real-time speech processing

Ultravox v0.5 Llama 3.2 1B
MIT
A multilingual text-to-text model preloaded with meta-llama/Llama-3.2-1B-Instruct weights
Large Language Model Transformers Supports Multiple Languages
FriendliAI
211
0
Ultravox v0.5 Llama 3.2 1B ONNX
MIT
Ultravox is a multilingual audio-to-text model built on the Llama-3.2-1B architecture, supporting speech recognition and transcription in multiple languages.
Audio-to-Text Transformers Supports Multiple Languages
onnx-community
1,088
3
Segmentation 3.0
MIT
This is an audio segmentation model capable of detecting speaker changes, voice activity, and overlapping speech, suitable for audio analysis in multi-speaker scenarios.
Audio Processing
fatymatariq
1,228
0
Uzbek STT 3
Apache-2.0
A fine-tuned Uzbek speech recognition model based on Oyqiz/uzbek_stt, specifically optimized for legal and military domain data
Speech Recognition Transformers Other
sarahai
157
3
Segmentation 3.0
MIT
This is a speaker segmentation model based on pyannote.audio, capable of detecting speech activity, speaker changes, and overlapping speech.
Audio Processing
tensorlake
387
1
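The segmentation entries above name three outputs: voice activity, speaker changes, and overlapping speech. As a rough illustration of how these relate, the sketch below assumes the model output has already been converted into a hypothetical per-frame list of active-speaker sets; it is not the pyannote.audio API, and the function name `summarize_frames` is made up for this example.

```python
from typing import List, Optional, Set, Tuple


def summarize_frames(
    frames: List[Set[str]],
) -> Tuple[List[bool], List[bool], List[int]]:
    """Derive voice activity, overlap, and speaker-change frames.

    `frames` is a hypothetical representation: one set of active
    speaker labels per analysis frame (empty set = silence).
    """
    # Voice activity: at least one speaker is active in the frame.
    voice_activity = [len(active) > 0 for active in frames]
    # Overlapping speech: two or more speakers are active at once.
    overlap = [len(active) > 1 for active in frames]

    # Speaker change: a non-silent frame whose active set differs from
    # the most recent non-silent frame (silent gaps are skipped over).
    changes: List[int] = []
    last: Optional[Set[str]] = None
    for i, active in enumerate(frames):
        if not active:
            continue
        if last is not None and active != last:
            changes.append(i)
        last = active
    return voice_activity, overlap, changes


frames = [set(), {"A"}, {"A"}, {"A", "B"}, {"B"}, set(), {"A"}]
vad, ovl, chg = summarize_frames(frames)
print(chg)  # [3, 4, 6]
```

Treating a silent gap as transparent means "A, silence, A" is not a change, while "A, silence, B" is; other conventions are equally reasonable.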
Speaker Diarization 3.0
MIT
Speaker diarization pipeline trained with pyannote.audio 3.0.0, supporting automatic voice activity detection, speaker change detection, and overlapping speech detection
Speaker Analysis
pyannote
463.91k
186
Wav2vec Fine Tuned Speech Command2
Apache-2.0
A speech command classification model fine-tuned from facebook/wav2vec2-base on the speech_commands dataset, reaching 97.35% accuracy
Audio Classification Transformers
Thamer
16
0
Speechcommand Demo
Apache-2.0
A voice command classification model fine-tuned from facebook/wav2vec2-base on the superb dataset, reaching 98.09% accuracy
Audio Classification Transformers
SHENMU007
18
0
S2T Small MuST-C En-Nl ST
MIT
An end-to-end speech translation model based on the S2T architecture, designed for English-to-Dutch speech translation
Speech Recognition Transformers Supports Multiple Languages
facebook
20
0
Wav2vec2 Large Xlsr 53 Greek
Apache-2.0
This is a Greek automatic speech recognition model based on the XLSR-Wav2Vec2 architecture, developed by the Hellenic Military Academy and the Technical University of Crete.
Speech Recognition Other
lighteternal
443
8
SepFormer WHAM Enhancement
Apache-2.0
A SpeechBrain SepFormer model for speech enhancement (denoising), pre-trained on the 8 kHz version of the WHAM! dataset to remove environmental noise and reverberation.
Audio Enhancement English
speechbrain
827
23
SepFormer WHAMR Enhancement
Apache-2.0
This model performs speech enhancement (denoising and dereverberation) with the SepFormer architecture. It is pre-trained on the WHAMR! dataset (8 kHz) and reaches an SI-SNR of 10.59 dB on the test set.
Audio Enhancement English
speechbrain
570
11
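The WHAMR! entry reports quality as SI-SNR (scale-invariant signal-to-noise ratio). Using the common definition, both signals are made zero-mean, the estimate $\hat{s}$ is projected onto the reference $s$ as $s_{\text{target}} = \frac{\langle \hat{s}, s\rangle}{\lVert s\rVert^2} s$, the residual is $e = \hat{s} - s_{\text{target}}$, and SI-SNR $= 10 \log_{10} \frac{\lVert s_{\text{target}}\rVert^2}{\lVert e\rVert^2}$ dB. The sketch below is a plain-Python illustration of that formula, not SpeechBrain's own evaluation code; the `eps` guard is an assumption added to avoid division by zero.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between two equal-length signals."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the DC offset so constant shifts do not affect the score.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    # Project the estimate onto the reference: s_target = (<est, ref> / ||ref||^2) * ref.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    # Everything not explained by the scaled reference counts as error.
    error = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    error_energy = sum(x * x for x in error)
    return 10 * math.log10((target_energy + eps) / (error_energy + eps))


reference = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]  # orthogonal to the reference, energy 1.0
estimate = [r + n for r, n in zip(reference, noise)]
print(round(si_snr(estimate, reference), 2))  # 6.02, i.e. 10*log10(4 / 1)
```

Because of the projection step, multiplying the estimate by any nonzero constant leaves the score unchanged, which is what "scale-invariant" refers to.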
S2T Small MuST-C En-Es ST
MIT
A speech-to-text transformer model for end-to-end English-to-Spanish speech translation
Speech Recognition Transformers Supports Multiple Languages
facebook
20
0
ConvTasNet Libri3Mix SepNoisy 8k
A ConvTasNet model trained with the Asteroid framework to separate 3 independent audio sources from mixed audio, optimized for noisy speech at an 8 kHz sampling rate.
Sound Separation
JorisCos
33
2
© 2025 AIbase